import SetupEnv from './common/setup-env.mdx';
import PrepareIOS from './common/prepare-ios.mdx';

# Integrate with iOS (WebDriverAgent)

After connecting an iOS device through WebDriverAgent, you can use the Midscene JavaScript SDK to control it.

import { PackageManagerTabs } from '@theme';

:::info Demo Projects
Control iOS devices with the JavaScript SDK: [https://github.com/web-infra-dev/midscene-example/blob/main/ios/javascript-sdk-demo](https://github.com/web-infra-dev/midscene-example/blob/main/ios/javascript-sdk-demo)

Integrate with Vitest for testing: [https://github.com/web-infra-dev/midscene-example/tree/main/ios/vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/ios/vitest-demo)
:::

:::info Showcases

[More showcases](./blog-support-ios-automation.mdx)

<p align="center">
  <img src="/ios.png" alt="ios" width="400" />
</p>

:::

## About WebDriver and Midscene's Relationship

WebDriver is a standard protocol established by W3C for browser automation, providing a unified API to control different browsers and applications. The WebDriver protocol defines the communication method between client and server, enabling automation tools to control various user interfaces across platforms.

Thanks to the Appium team and other open source communities, the industry now has many excellent libraries that expose desktop and mobile device automation through the WebDriver protocol. These tools include:
- **Appium** - Cross-platform mobile automation framework
- **WebDriverAgent** - Service dedicated to iOS device automation
- **Selenium** - Web browser automation tool
- **WinAppDriver** - Windows application automation tool

**Midscene adapts to the WebDriver protocol**, which means developers can use AI models to perform intelligent automated operations on any device that supports WebDriver. With this design, Midscene can perform traditional operations such as clicking and typing, and it can also:
- Understand interface content and context
- Execute complex multi-step operations
- Perform intelligent assertions and validations
- Extract and analyze interface data

On iOS, Midscene connects to devices through WebDriverAgent, allowing you to control iOS apps and the system using natural-language descriptions.
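
Because WebDriverAgent speaks the WebDriver protocol over HTTP, you can probe it before running a script. The following is a minimal sketch, assuming the server answers the standard WebDriver `GET /status` request with a body shaped like `{ value: { ready: boolean } }`. The helper names `wdaStatusUrl` and `isWdaReady` are hypothetical and are not part of `@midscene/ios`.

```typescript
// Sketch only: helpers for probing a WebDriverAgent server.
// `wdaStatusUrl` and `isWdaReady` are hypothetical names, not Midscene APIs.

/** Build the URL of the WebDriver `/status` endpoint. */
function wdaStatusUrl(host: string = 'localhost', port: number = 8100): string {
  return `http://${host}:${port}/status`;
}

/**
 * Return true when a parsed `/status` response reports readiness.
 * Assumes the response shape { value: { ready: boolean, ... } }.
 */
function isWdaReady(payload: unknown): boolean {
  if (typeof payload !== 'object' || payload === null) return false;
  const value = (payload as { value?: unknown }).value;
  if (typeof value !== 'object' || value === null) return false;
  return (value as { ready?: unknown }).ready === true;
}
```

With Node.js 18 or later, you could combine the two as `isWdaReady(await (await fetch(wdaStatusUrl())).json())` before creating an agent.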

<PrepareIOS />

<SetupEnv />

## Integrate Midscene

### Step 1: Install dependencies

<PackageManagerTabs command="install @midscene/ios --save-dev" />

### Step 2: Write scripts

The following example uses Safari on iOS to search for headphones.

Write the following code and save it as `./demo.ts`:

```typescript title="./demo.ts"
import {
  IOSAgent,
  IOSDevice,
  agentFromWebDriverAgent,
} from '@midscene/ios';

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // Method 1: Create device and agent directly
    const page = new IOSDevice({
      wdaPort: 8100,
      wdaHost: 'localhost',
    });

    // 👀 Initialize Midscene agent
    const agent = new IOSAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    });
    await page.connect();

    // Method 2: Or use convenience function (recommended)
    // const agent = await agentFromWebDriverAgent({
    //   wdaPort: 8100,
    //   wdaHost: 'localhost',
    //   aiActionContext: 'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    // });

    // 👀 Directly open ebay.com webpage (recommended approach)
    await page.launch('https://ebay.com');
    await sleep(3000);

    // 👀 Enter keywords and perform search
    await agent.aiAction('Search for "Headphones"');

    // 👀 Wait for loading to complete
    await agent.aiWaitFor('At least one headphone product is displayed on the page');
    // Or you can use a simple sleep:
    // await sleep(5000);

    // 👀 Understand page content and extract data
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find product titles and prices in the list',
    );
    console.log('Headphone product information', items);

    // 👀 Use AI assertion
    await agent.aiAssert('Multiple headphone products are displayed on the interface');

    await page.destroy();
  })(),
);
```

### Step 3: Run

Use `tsx` to run the script:

```bash
# run
npx tsx demo.ts
```

Shortly after, you will see output like this:

```log
[
 {
   itemTitle: 'AirPods Pro (2nd generation) with MagSafe Charging Case (USB-C)',
   price: 249
 },
 {
   itemTitle: 'Sony WH-1000XM4 Wireless Premium Noise Canceling Overhead Headphones',
   price: 278
 }
]
```

### Step 4: View execution report

When the command succeeds, it prints `Midscene - report file updated: /path/to/report/some_id.html` to the console. Open this file in a browser to view the report.
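
If you capture the script's console output, for example in CI, the report path can be pulled out of that line. This is a minimal sketch that assumes the message format shown above stays stable; `extractReportPath` is a hypothetical helper, not a Midscene API.

```typescript
// Sketch only: extract the report path from a captured console line.
// Assumes the "Midscene - report file updated: <path>.html" format shown above.
function extractReportPath(line: string): string | null {
  const match = line.match(/Midscene - report file updated: (.+\.html)\s*$/);
  // Return the captured path, or null when the line is not a report message.
  return match?.[1] ?? null;
}
```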

## Constructor and Interface

### `IOSDevice` Constructor

The IOSDevice constructor supports the following parameters:

- `opts?: IOSDeviceOpt` - Optional parameters for IOSDevice configuration
  - `wdaPort?: number` - Optional, WebDriverAgent port. Default is 8100.
  - `wdaHost?: string` - Optional, WebDriverAgent host. Default is 'localhost'.
  - `autoDismissKeyboard?: boolean` - Optional, whether to automatically dismiss keyboard after text input. Default is true.
  - `customActions?: DeviceAction<any>[]` - Optional, list of custom device actions.
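
To make the defaults concrete, here is a small sketch of how the documented values combine with user-supplied options. `IOSDeviceOptSketch` and `resolveDeviceOpts` are illustrative names that mirror the list above; they are not the library's implementation, and `customActions` is omitted for brevity.

```typescript
// Sketch only: a local mirror of the documented options (not the real type).
interface IOSDeviceOptSketch {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
}

// Defaults as documented in the list above.
const DOCUMENTED_DEFAULTS: Required<IOSDeviceOptSketch> = {
  wdaPort: 8100,
  wdaHost: 'localhost',
  autoDismissKeyboard: true,
};

// Fill in every option the caller left undefined with its documented default.
function resolveDeviceOpts(
  opts: IOSDeviceOptSketch = {},
): Required<IOSDeviceOptSketch> {
  return {
    wdaPort: opts.wdaPort ?? DOCUMENTED_DEFAULTS.wdaPort,
    wdaHost: opts.wdaHost ?? DOCUMENTED_DEFAULTS.wdaHost,
    autoDismissKeyboard:
      opts.autoDismissKeyboard ?? DOCUMENTED_DEFAULTS.autoDismissKeyboard,
  };
}
```

Using `??` rather than object spread keeps an explicit `undefined` from overriding a default, while an explicit `false` for `autoDismissKeyboard` is still respected.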

### Additional iOS Agent Interfaces

In addition to the common Agent interfaces in [API Reference](./api.mdx), IOSAgent provides some additional interfaces:

### `agent.launch()`

Launch a web page or native iOS application.

- Type

```typescript
function launch(uri: string): Promise<void>;
```

- Parameters:

  - `uri: string` - URI to open. It can be a web URL, a native app bundle identifier, or a custom URL scheme.

- Return Value:

  - `Promise<void>`

- Example:

```typescript
import { IOSAgent, IOSDevice, agentFromWebDriverAgent } from '@midscene/ios';

// Method 1: Create device and agent manually
const page = new IOSDevice();
const agent = new IOSAgent(page);
await page.connect();

// Method 2: Or use the convenience function (recommended)
// const agent = await agentFromWebDriverAgent();

await agent.launch('https://www.apple.com'); // Open web page
await agent.launch('com.apple.mobilesafari'); // Launch Safari
await agent.launch('com.apple.Preferences'); // Launch Settings app
await agent.launch('myapp://profile/user/123'); // Open app deep link
await agent.launch('tel:+1234567890'); // Make a phone call
await agent.launch('mailto:example@email.com'); // Send an email
```
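
The three kinds of `uri` accepted by `launch()` can be told apart by their shape. The sketch below shows one plausible way to classify them; it is an assumption for illustration, not the detection logic Midscene actually uses, and `classifyLaunchUri` is a hypothetical name.

```typescript
// Sketch only: classify a launch URI by its shape.
// This is NOT Midscene's internal logic; the rules below are assumptions.
type LaunchTargetKind = 'web-url' | 'url-scheme' | 'bundle-id';

function classifyLaunchUri(uri: string): LaunchTargetKind {
  // http:// and https:// are treated as web pages.
  if (/^https?:\/\//i.test(uri)) return 'web-url';
  // Any other "scheme:" prefix (myapp://, tel:, mailto:) is a URL scheme.
  if (/^[a-z][a-z0-9+.-]*:/i.test(uri)) return 'url-scheme';
  // Everything else is assumed to be a reverse-DNS bundle identifier.
  return 'bundle-id';
}
```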

### `agentFromWebDriverAgent()` (Recommended)

Create an `IOSAgent` by connecting to the WebDriverAgent service. This is the most convenient way to get an agent.

- Type

```typescript
function agentFromWebDriverAgent(
  opts?: PageAgentOpt & IOSDeviceOpt,
): Promise<IOSAgent>;
```

- Parameters:

  - `opts?: PageAgentOpt & IOSDeviceOpt` - Optional, configuration for initializing `IOSAgent`. For `PageAgentOpt`, see [Constructor](./api.mdx); for `IOSDeviceOpt`, see [IOSDevice Constructor](./integrate-with-ios#iosdevice-constructor).

- Return Value:

  - `Promise<IOSAgent>` Returns an IOSAgent instance

- Example:

```typescript
import { agentFromWebDriverAgent } from '@midscene/ios';

// Use default WebDriverAgent address (localhost:8100)
const agent = await agentFromWebDriverAgent();

// Or pass the WebDriverAgent address and agent options explicitly
const customAgent = await agentFromWebDriverAgent({
  wdaHost: 'localhost',
  wdaPort: 8100,
  aiActionContext: 'If popups appear, click agree',
});
```



## Extending Custom Interaction Actions

You can extend the Agent's action space by defining custom interaction actions with `defineAction` and passing them through the `customActions` option. These actions are appended after the built-in actions, so the Agent can call them during planning.

```typescript
import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { agentFromWebDriverAgent } from '@midscene/ios';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // Implement custom click logic combining locate + count
  },
});

const agent = await agentFromWebDriverAgent({
  customActions: [ContinuousClick],
});

await agent.aiAction('Click the red button five times');
```
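
The `call` body in the example is left as a placeholder. One way to fill it in is sketched here, assuming the device offers a coordinate-based tap method and that `locate.center` is an `[x, y]` pair. `TapTarget`, `LocatedElement`, and `tapRepeatedly` are hypothetical names introduced for illustration, not Midscene APIs.

```typescript
// Sketch only: repeated-tap logic that a custom action's `call` could delegate to.

// Hypothetical minimal device surface: anything that can tap at a point.
interface TapTarget {
  tap(x: number, y: number): Promise<void>;
}

// Assumed shape of the located element: a center point as [x, y].
interface LocatedElement {
  center: [number, number];
}

async function tapRepeatedly(
  device: TapTarget,
  element: LocatedElement,
  count: number,
  intervalMs: number = 100,
): Promise<void> {
  const [x, y] = element.center;
  for (let i = 0; i < count; i++) {
    await device.tap(x, y);
    // Pause between taps so the UI can register each one; skip after the last.
    if (i < count - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

Keeping the tap logic behind a small interface also makes it easy to exercise with a fake device in unit tests, without a connected iPhone.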

For more details about custom actions, refer to [Integrate with any interface](./integrate-with-any-interface).

## More

- For more Agent API interfaces, refer to [API Reference](./api.mdx).
- For more prompting tips, refer to [Prompting Tips](./prompting-tips).

## FAQ

### Why can't I control my device through WebDriverAgent even though it's connected?

Please check the following:

1. **Developer Mode**: Ensure it's enabled in Settings > Privacy & Security > Developer Mode
2. **UI Automation**: Ensure it's enabled in Settings > Developer > UI Automation
3. **Device Trust**: Ensure the device trusts the current Mac

### What are the differences between simulators and real devices?

| Feature | Real Device | Simulator |
|---------|-------------|-----------|
| Port Forwarding | Requires iproxy | Not required |
| Developer Mode | Must enable | Auto-enabled |
| UI Automation Settings | Must enable manually | Auto-enabled |
| Performance | Real device performance | Depends on Mac performance |
| Sensors | Real hardware | Simulated data |

### How to use custom WebDriverAgent port and host?

You can specify the WebDriverAgent port and host through the `IOSDevice` constructor or `agentFromWebDriverAgent`:

```typescript
import { IOSDevice, agentFromWebDriverAgent } from '@midscene/ios';

// Method 1: Using IOSDevice
const device = new IOSDevice({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

// Method 2: Using convenience function (recommended)
const agent = await agentFromWebDriverAgent({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});
```

For remote devices, you also need to set up port forwarding accordingly:

```bash
iproxy 8100 8100 YOUR_DEVICE_ID
```
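
When the host and port differ between machines, for example a laptop and a CI runner, it can help to read them from the environment instead of hard-coding them. The sketch below assumes two environment variable names, `WDA_HOST` and `WDA_PORT`, which are hypothetical and not read by Midscene itself; `wdaEndpointFromEnv` is likewise an illustrative helper.

```typescript
// Sketch only: derive wdaHost/wdaPort from environment variables.
// WDA_HOST and WDA_PORT are hypothetical names chosen for this example.
interface WdaEndpoint {
  wdaHost: string;
  wdaPort: number;
}

function wdaEndpointFromEnv(
  env: Record<string, string | undefined>,
): WdaEndpoint {
  // Fall back to the documented defaults when a variable is missing or empty.
  const wdaHost = env['WDA_HOST']?.trim() || 'localhost';
  const rawPort = env['WDA_PORT']?.trim();
  const wdaPort = rawPort ? Number(rawPort) : 8100;
  // Reject anything that is not a valid TCP port number.
  if (!Number.isInteger(wdaPort) || wdaPort < 1 || wdaPort > 65535) {
    throw new Error(`Invalid WDA_PORT: ${rawPort}`);
  }
  return { wdaHost, wdaPort };
}
```

In a script you would pass `process.env` and spread the result into the options, for example `agentFromWebDriverAgent({ ...wdaEndpointFromEnv(process.env) })`.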

## iOS-Specific Actions

The iOS package includes iOS-specific actions that can be used in automation:

```typescript
// Press home button
await agent.callAction('IOSHomeButton');

// Open app switcher
await agent.callAction('IOSAppSwitcher');

// Long press with custom duration
await agent.callAction('IOSLongPress', {
  locate: 'menu item',
  duration: 2000, // 2 seconds
});
```

## Best Practices

### 1. Device Management

Always properly connect and destroy devices:

```typescript
try {
  await device.connect();
  // Your automation code here
} finally {
  await device.destroy();
}
```

### 2. Wait for UI Updates

iOS animations and transitions may need time to complete:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

await agent.aiTap('button');
await sleep(1000); // Wait for animation
await agent.aiAssert('new screen loaded');
```

### 3. Handle Keyboard Input

To dismiss the keyboard automatically after text input, pass `autoDismissKeyboard`:

```typescript
await agent.aiInput('text', 'input field', {
  autoDismissKeyboard: true, // Automatically dismiss keyboard
});
```

### 4. Bundle Identifiers

Common iOS app bundle identifiers:

- Safari: `com.apple.mobilesafari`
- Settings: `com.apple.Preferences`
- Messages: `com.apple.MobileSMS`
- Camera: `com.apple.camera`
- Photos: `com.apple.mobileslideshow`

## Testing Integration

### Vitest Integration

```typescript title="test/ios.test.ts"
import { describe, it, beforeAll, afterAll } from 'vitest';
import { IOSDevice, IOSAgent } from '@midscene/ios';

describe('iOS App Tests', () => {
  let device: IOSDevice;
  let agent: IOSAgent;

  beforeAll(async () => {
    device = new IOSDevice();
    agent = new IOSAgent(device);
    await device.connect();

    // Or use the convenience function (recommended):
    // agent = await agentFromWebDriverAgent();
  });

  afterAll(async () => {
    await device.destroy();
  });

  it('should launch Safari and navigate', async () => {
    await device.launch('com.apple.mobilesafari');
    await agent.aiAssert('Safari is open');
  });
});
```

## Troubleshooting

### WebDriverAgent Connection Issues

If you encounter WebDriverAgent connection issues:

1. **Check port forwarding**:
   ```bash
   lsof -i:8100  # Should show iproxy process
   ```

2. **Rebuild WebDriverAgent**: The iOS package automatically rebuilds WebDriverAgent when needed.

3. **Check device trust**:
   - Ensure your Mac is trusted on the iOS device
   - Check Developer Mode is enabled

### Common Errors

**"Device not found"**:
- Verify that the device is connected via USB
- Check the device ID with `idevice_id -l`
- Ensure port forwarding is active

**"WebDriverAgent session failed"**:
- Restart port forwarding
- Check whether WebDriverAgent is running on the device
- Verify the development team configuration

**"Element not found"**:
- Use more descriptive element descriptions
- Wait for UI animations to complete
- Check if element is visible on screen

## Next Steps

- Explore [API Reference](./api) for complete method documentation
- Check out [Prompting Tips](./prompting-tips) for better AI interactions
- Learn about [Model Configuration](./model-provider) for optimal performance